Telecommunication system with error messages corresponding to speech recognition errors

ABSTRACT

At least one voice parameter is analyzed during a voice recognition process. If the voice parameter(s) exceed a threshold, a message is issued to the user, which is specifically designed to request observance of the value range that has been predetermined for the voice parameter. The message prompts the user to re-input the command with a correction which has been adjusted to the voice parameter.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is based on and hereby claims priority to German Application No. 199 567 47.6 filed on Nov. 25, 1999, the contents of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

In the field of man/machine telecommunications, most approaches focus on evaluating the information content of human language, because the spoken word in particular is a very important means in people's everyday lives of communicating targeted information in an easy, rapid and very compact way. Owing to its widespread availability and familiarity, the telephone is recognized as the transmission medium for the spoken word in everyday life. In order to facilitate and automate simple parts of the exchange of information between man and machine via telephone, voice recognition methods and apparatuses are being used for accepting orders in call centers or in telebanking information systems and order-receiving systems.

Previously known user-independent voice recognition methods and devices often differ considerably from people's spontaneous and natural interchange which is customary on the telephone. Malfunctions in the form of voice recognition errors are frequent with known systems, because known analysis methods react sensitively to particular features of the respective input signals, for example a user's manner of speaking. There is therefore a severe increase in the error rate in voice signals transmitted by telephone when, for example, there is severe background noise or when a person speaks too quickly or too slowly. This may produce virtually unusable results. In order to overcome this problem, it is known to request the user to speak clearly once more. An automatic announcement is then generated which may sound as follows: “I didn't understand you, please speak more clearly”.

In order to improve voice recognition while maintaining as far as possible a natural speech rhythm in human speech, complex methods are proposed for particularly adapting the machine to each individual user, as presented for example in a summary in the book “Anwendungsspezifische Online-Anpassung von Hidden-Markov-Modellen in automatischen Spracherkennungssystemen” by Udo Bub, Herbert Utz Verlag, Munich, 1999, the title of which can be translated as “Application-specific online adaptation of hidden Markov models in automatic voice recognition systems”.

SUMMARY OF THE INVENTION

The object of the present invention is to provide a method, a device and a telecommunications system for user-independent voice recognition, the acceptance by the user being increased by a natural method adapted to human use and/or by an apparatus for implementing this method and/or a corresponding telecommunications system.

This object is achieved by a method of analyzing a speech parameter during voice recognition, a request which is specifically directed at achieving re-compliance with the value range defined for the speech parameter being issued to the user when a threshold value is exceeded by the speech parameter. Whereas known methods demanded rigid adaptation to the system by the user, so that the acceptance of the user drops owing to the associated lack of naturalness, a method according to the invention analyzes the quality of the incoming voice signal and requests the user, by a message which is specifically adapted to the speech parameter, to make a further voice input. The user is therefore selectively prompted to actively adapt his way of speaking.

Within the scope of a possible implementation of a method according to the invention, in a preferred embodiment the user can specifically be provided with the sentence “please speak more softly”, in the same way as when he is conversing with another person.

In one development, a plurality of threshold values can also be defined for a speech parameter. When the different threshold values are exceeded, the content of the message to be output can be adapted accordingly. Specifically in the case of the correction of the volume, presented by way of example above, this results in a correction range from “softer” through “somewhat louder” to “louder”.

A characteristic variable for the quality of the incoming voice signal, which can also be evaluated as an indication of the quality of the voice recognition, can be determined by reference to the speech parameter. A systematic error can also be detected by reference to persistent cases of the threshold values being exceeded. If, for example, such a case is detected on a transmission channel of a telecommunications system provided with a voice recognition apparatus according to the invention, channel measurement can be initiated within the scope of the described method. In this case, it is even possible to provide according to the invention that the user is requested to use a different telephone terminal when there are indications of a suspected fault.

A voice recognition apparatus according to the invention may include at least one device for processing digitized data of a voice signal, a speech-outputting device, devices for analyzing and monitoring a speech parameter, a device for determining when a threshold value for the speech parameter is exceeded, a device for generating and outputting a notification in digital or analog form, in particular a speech synthesizing device, the notification being generated as a function of a threshold value for the speech parameter being exceeded, and a device for transmitting the notification to a user who generates the voice signal.

A telecommunications system according to the invention may include a multiplicity of telephone terminals, converters for digital/analog and analog/digital conversion and signal conditioning, a connecting line for each of the telephone terminals, a channel-bundling and channel-splitting unit, at least one switching office and a voice recognition device.

The present voice recognition apparatus using a method according to the invention is explained in more detail below with reference to the associated drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects and advantages of the present invention will become more apparent and more readily appreciated from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a basic circuit diagram of a telecommunications system according to the invention, and

FIG. 2 is a flowchart of a voice recognition device.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.

FIG. 1 is a schematic view of an apparatus according to the invention in the form of a telecommunications system which operates according to the method in accordance with the invention. A telecommunications system is illustrated which comprises a multiplicity of telephone terminals TEG which contain converters for digital/analog and analog/digital conversion A/D, D/A and integrated signal conditioning. The illustrated telephone terminals TEG can therefore be devices which operate in a digital fashion, inter alia commercially available EURO-ISDN devices according to the Deutsche Telekom standard. The telephone terminals TEG are connected to a connecting line VL for each of the telephone terminals TEG with channel-bundling and channel-splitting unit KM, KM⁻¹ so that the voice signals S are fed in digitized form to a voice recognition device SEV via a switching office VS on the connecting lines VL.

The voice recognition device SEV is divided into a voice recognition device SE for processing the input signal IN and a speech synthesizing device SSV. Examination for errors during the voice recognition is carried out in the voice recognition device SE (FIG. 2). When an error occurs or when an error quotient is reached, an error message FM together with specific parameters determined during the voice recognition is transmitted to an evaluation logic unit. The individual parameters are examined here to determine whether a respective threshold value S is exceeded.

Properties of a voice signal which exert a significant influence on the error quotient during the voice recognition are represented as parameters. Some examples of such parameters are presented below:

One such parameter is the volume L of the voice signal. The value of the parameter L can be acquired from the analog and the digital voice signal. The voice recognition device SEV cannot exert any influence on the input amplification of the voice signal at a telephone terminal TEG (FIG. 1), as these terminals are, due to the system, independent with fixed properties, both from the downstream transmission path and from the requirements of the voice recognition device SEV as a receiver.

The output of an audible message to the user can be made more precise by using incremental threshold values. For instance, when the corresponding threshold values for the volume L are exceeded, the speaker is requested to speak more softly, somewhat louder or louder.
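The graded volume thresholds described above can be sketched as follows. This is only an illustrative sketch: the source gives no numeric threshold values, so the dB levels and the function name `volume_prompt` are assumptions chosen for the example.

```python
from typing import Optional

# Hypothetical threshold values in dB relative to full scale.
# The source defines no numbers; these are assumptions for illustration.
TOO_LOUD_DB = -6.0        # above this, ask the speaker to be softer
SLIGHTLY_SOFT_DB = -30.0  # below this, ask for "somewhat louder"
TOO_SOFT_DB = -40.0       # below this, ask for "louder"


def volume_prompt(level_db: float) -> Optional[str]:
    """Return a graded correction request, or None if the volume L is acceptable."""
    if level_db > TOO_LOUD_DB:
        return "Please speak more softly"
    if level_db < TOO_SOFT_DB:
        return "Please speak louder"
    if level_db < SLIGHTLY_SOFT_DB:
        return "Please speak somewhat louder"
    return None
```

With these assumed values, a level of -35 dB yields the intermediate request, while -50 dB yields the stronger one.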

A further parameter is the signal-to-noise ratio SNR of a voice signal. If the signal-to-noise ratio SNR in the voice signal IN present at the voice recognition device SE is too low, the voice cannot be recognized without error. There are in fact several possible ways of automatically improving the signal-to-noise ratio, for example specific digital filter methods whose filter parameter values are set in accordance with the current case, or else methods such as autocorrelation for subsequent improvement of the channel transmission properties.

When the threshold value for the signal-to-noise ratio is exceeded, the volume L can first be checked. If the volume is too low, the speaker is requested to speak louder even if the threshold value which applies to the volume has not yet been undershot. As a result, a larger signal-to-noise ratio is established. If the signal-to-noise ratio which results from this is still not sufficient or if the volume L is not low, unfavorable circumstances apply, for example the speaker may be speaking in a noisy environment (for example waiting rooms of railway stations and airports) or the transmission is subject to interference. The speaker is then requested, for example, to speak from a different location or a different telephone.
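The two-stage decision for an insufficient signal-to-noise ratio can be sketched roughly as below. The threshold values, the parameter `already_asked_louder`, and the function name are hypothetical; the source describes only the order of the checks, not concrete numbers or interfaces.

```python
from typing import Optional

# Hypothetical values; the source specifies none.
SNR_MIN_DB = 15.0      # minimum acceptable signal-to-noise ratio
VOLUME_LOW_DB = -30.0  # level regarded as "low" for this check only


def snr_prompt(snr_db: float, level_db: float,
               already_asked_louder: bool = False) -> Optional[str]:
    """Choose a request when the SNR is insufficient, or None if it is acceptable."""
    if snr_db >= SNR_MIN_DB:
        return None
    # First check the volume: a louder voice raises the SNR.
    if level_db < VOLUME_LOW_DB and not already_asked_louder:
        return "Please speak louder"
    # Volume is not low, or speaking louder did not help:
    # assume a noisy environment or a disturbed transmission.
    return "Please speak from a different location or a different telephone"
```

The flag `already_asked_louder` stands in for whatever state the system keeps between attempts, so that a second failure after a louder input leads to the location request.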

A further important parameter is the speaking speed v, which can be too high or too low. The speaking speed v is detected, for example, by measuring the phonemes over time, the term phoneme meaning the smallest linguistic basic unit of a language which distinguishes meaning. Like a person, a machine can no longer follow speech which is spoken too quickly and a correspondingly rapid succession of phonemes, as a result of which the error quotient rises greatly. In particular it is known that the detection rate when inputting numbers drops significantly as the speaking speed increases. On the other hand, in sentence recognition methods which process several words or entire sentences at once, an excessively low speaking speed also creates problems because the system must then wait for unusually long periods for the occurrence of an item of speech which it can process.

When the corresponding threshold values are exceeded, the speaker is requested to speak more slowly or more quickly.
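One possible way of measuring the speaking speed v as phonemes over time, and of comparing it with a minimum and a maximum threshold, is sketched below. The rate limits and the assumption that the recognizer supplies phoneme onset times in seconds are illustrative only and are not taken from the source.

```python
from typing import Optional, Sequence

# Hypothetical limits in phonemes per second; the source gives no values.
MIN_RATE = 5.0
MAX_RATE = 18.0


def speaking_rate(phoneme_onsets: Sequence[float]) -> Optional[float]:
    """Estimate the speaking speed v from phoneme onset times (in seconds).

    Returns None when fewer than two phonemes were detected.
    """
    if len(phoneme_onsets) < 2:
        return None
    duration = phoneme_onsets[-1] - phoneme_onsets[0]
    if duration <= 0:
        return None
    return (len(phoneme_onsets) - 1) / duration


def speed_prompt(rate: Optional[float]) -> Optional[str]:
    """Return a correction request if the rate leaves the permitted range."""
    if rate is None:
        return None
    if rate > MAX_RATE:
        return "Please speak more slowly"
    if rate < MIN_RATE:
        return "Please speak more quickly"
    return None
```

For example, five phonemes spread evenly over one second give a rate of 4 phonemes per second, which falls below the assumed minimum and produces the request to speak more quickly.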

Spectral properties of the voice signal are also a possible further source of an increased error quotient during voice recognition. The voice signal which is restricted to a narrow frequency band by the transmission by telephone has common features in all human speakers, which can be used in speech recognition. Differences may occur here owing to the microphones used in a particular case. However, because the microphones used in telephone terminals are always approximately of the same quality, this influence is negligible in comparison with the influence of the angle and the distance of the speaker from the microphone. From a difference in the volume and in the spectral properties of the voice signal it is possible to detect that a speaker is not speaking directly into a microphone from a short distance. For this reason, a spectral frequency shift Δf is defined as a parameter, the value of the frequency shift Δf being generated by a directional characteristic of the microphone together with an angle of incidence of the voice signal on the microphone.

The threshold values of the spectral frequency shift being exceeded thus means that the speaker has not positioned the microphone or the receiver of a telephone in front of his mouth. In such a case, the speaker is requested to position the microphone near to his mouth.

The aforesaid parameters consequently constitute a quality criterion for the voice signal IN to be recognized. In the embodiment in FIG. 1, the digitized voice signal IN which is received in the SEV is then input directly into the voice recognition unit SE. Here, an ongoing error check is carried out. If errors occur, the values of the signal-to-noise ratio SNR, spectral frequency shift Δf, speaking speed v and volume L parameters are supplied to a central unit ZE. Here, threshold value measuring devices SW are arranged which, when a respective threshold value S for the aforesaid parameters is exceeded, output their own control signal to a speech synthesizing device SSV. These steps all take place in real time in order to avoid delaying the input method. The sum of these control signals is processed in the speech synthesizing device SSV in that a message, which is for example audible, and may also be in more than one part, is built up from them. If a plurality of threshold values S_(i) are provided, for example for the evaluation of the volume L, quantization can be determined in the central unit ZE. It is then determined, for example, that the input signal for correct detection is only slightly too soft. The message which is emitted in response to the corresponding control signal would be, for example: “Please speak slightly louder”.

As illustrated in FIG. 2, the central unit ZE checks the threshold values of all the predetermined parameters before each speech output brought about by an error signal. Only the sum result of all the control signals, which is compiled by a central processing unit CPU within the central unit ZE, is passed to the speech synthesizing device SSV and converted there into a digital message which can be listened to after analog conversion. If, therefore, the central unit ZE also determines that the speaker is speaking too quickly, the prepared message to the user is: “Please speak slightly louder and more slowly”.
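The compilation of several control signals into a single message can be illustrated with the following sketch. It assumes, purely for illustration, that each control signal has already been mapped to a short correction phrase; the function name `compose_message` and this representation are not taken from the source.

```python
from typing import List, Optional


def compose_message(corrections: List[str]) -> Optional[str]:
    """Combine individual correction phrases into one request to the user.

    Each entry stands for one control signal, e.g. "slightly louder".
    Returns None when no threshold value was exceeded.
    """
    if not corrections:
        return None
    if len(corrections) == 1:
        return f"Please speak {corrections[0]}"
    # Join all but the last phrase with commas, then append the last with "and".
    head = ", ".join(corrections[:-1])
    return f"Please speak {head} and {corrections[-1]}"


print(compose_message(["slightly louder", "more slowly"]))
```

Combining the phrases before synthesis means the user hears one request per failed input rather than a sequence of separate announcements.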

In contrast to systems according to the prior art, the message which is output according to the invention is individually adapted and matched to the specific case, and leads to an improvement in the voice recognition in a targeted fashion.

In the present exemplary embodiment, the message is transmitted to the user in an audible form, namely as a synthetically generated sentence (a message OUT). In order to output the message OUT, a digital signal is generated in the speech synthesizing device SSV and connected via connecting lines VL with channel-bundling unit KM and channel-splitting unit KM⁻¹ to the corresponding telephone terminal TEG via the switching office VS, in order to reach the user with a specific message after analog conversion as voice signal S.

The message OUT can also be processed by the apparatus described above in some other way in the reverse direction instead of as an audible message. For example, the message can be displayed to the user on the telephone terminal TEG, for example on a screen telephone or a PC with an integrated telephone or a display.

In contrast to the signal flow illustrated in FIG. 2, the error analysis can also be separated from the voice recognition unit SE and arranged alongside it within the voice recognition device SEV. In this way, the computing power which is available to the voice recognition device SE can be used completely for this one task, because the analysis of the results for errors can be carried out in parallel to this in a continuous process by examining the parameters mentioned above for cases in which threshold values are exceeded in threshold value measuring devices SW. In cases in which threshold values are exceeded, it is then possible to carry out error analysis selectively so that in the user-independent system described the entire loading on a system remains acceptable even with tight threshold values. A further possibility is to predefine input threshold values which are adapted quickly on a case-by-case basis. Thus, overall, reliable convergence with low error quotients while making a minimum number of requirements of the user and/or of system-internal control operations is achieved.

The invention has been described in detail with particular reference to preferred embodiments thereof and examples, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.

1. A telecommunications system with speech recognition, comprising: a multiplicity of telephone terminals; converters, coupled to at least some of said telephone terminals to provide digital/analog and analog/digital conversion and signal conditioning; channel-bundling and channel-splitting units coupled to said telephone terminals, via said converters when coupled thereto; and at least one switching office coupled to at least one of said channel-bundling and channel-splitting units and including a speech recognition device to analyze and monitor speaking speed in a voice signal received from a user via the at least one of said channel-bundling and channel-splitting units; a threshold device, coupled to said speech recognition device, to determine when the speaking speed is outside a predetermined range; and a speech synthesizing device, coupled to said threshold device, to generate an individually adapted notification in at least one of digital and analog form when the speaking speed is outside the predetermined range and to output the individually adapted notification via the at least one of said channel-bundling and channel-splitting units.
2. A speech recognition method, comprising: determining during speech recognition whether an error has occurred in the speech recognition; analyzing, if a corresponding error is determined, at least one speech parameter; determining compliance by monitoring speaking speed within the input voice signal and comparing the speaking speed with a threshold value range defined by at least one of a maximum threshold value and a minimum threshold value; and issuing to a user a message, individually adapted to provide compliance with the threshold value range specified for the speaking speed, to prompt re-input by the user with correction adapted to the speaking speed.
3. The method as claimed in claim 2, wherein said determining compliance includes monitoring a signal-to-noise ratio of an input voice signal.
4. The method as claimed in claim 3, wherein said determining compliance includes monitoring a specific range of a frequency spectrum of the input voice signal.
5. The method as claimed in claim 2, wherein said determining compliance includes monitoring the volume of the input voice signal.
6. The method as claimed in claim 2, wherein said determining compliance includes monitoring a plurality of speech parameters simultaneously.
7. The method as claimed in claim 6, wherein each of the speech parameters is monitored in a digitized voice signal.
8. The method as claimed in claim 2, wherein said issuing of the request is performed audibly as a spoken short record played by a speech synthesizing device.
9. The method as claimed in claim 2, wherein said determining compliance uses more than one threshold value to select among a plurality of messages to be issued to the user.
10. The method as claimed in claim 9, wherein said determining compliance of the speaking speed is performed in real time from input and processing of a voice signal.
11. A speech recognition apparatus, comprising: a speech recognition device to analyze and monitor speaking speed in a voice signal received from a user; a threshold device, coupled to said speech recognition device, to determine when the speaking speed is outside a predetermined range; a speech synthesizing device, coupled to said threshold device, to generate and output an individually adapted notification in at least one of digital and analog form when the speaking speed is outside the predetermined range, and an output device, coupled to said speech synthesizing device, to transmit the individually adapted notification to the user who generated the voice signal.
12. The apparatus as claimed in claim 11, wherein said speech recognition and output devices are coupled to at least one of a digital telephone system and an analog telephone system to receive the voice signal from and to transfer the individually adapted notification to the user who generated the voice signal.